-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify string isAllWhitespace implementation #235
Simplify string isAllWhitespace implementation #235
Conversation
(Hope you don't mind a drive by nerd snipe.) I took the liberty of doing some benchmarking here. Each case was 10,000 loops checking extension String {
func isAllWhitespace() -> Bool {
trimmingCharacters(in: .whitespacesAndNewlines) == ""
}
// func isAllWhitespace() -> Bool {
// allSatisfy { $0.isNewline || $0.isWhitespace }
// }
}
let allWhitespace = String(repeating: " ", count: 10_000)
//let noWhitespace = String(repeating: "a", count: 10_000)
//let mixed = String(repeating: " ", count: 2500) + String(repeating: "a", count: 5000) + String(repeating: " ", count: 2500)
var results: [Bool] = []
results.reserveCapacity(10_000)
for _ in 0..<10_000 {
results.append(allWhitespace.isAllWhitespace())
}
precondition(results.allSatisfy { $0 == true }) Here are the results.
This isn't exhaustive, but as you can see You can also take a look at the rough source of In the end, depending on typical use, this may be a useful change for small strings of that usually not whitespace, but I'm not sure the difference on the small end is work the larger difference on the high end. That's up to the project. |
I wonder if benchmarks could show the same performance difference on Linux and Windows, since closed-source Foundation on Mac may behave in unexpected ways. Also, if we could infer some typical length of strings where one method is beneficial over the other, maybe we could keep both applying them conditionally? |
199737e
to
072c57d
Compare
Thank you @jshier for running benchmarks, as always perf hints can be really surprising when we measure it.
Changed to operate directly on unicode view using char set, seems to close the huge gap...
Then I think is just see if is worth for the project, as mixed also seems to be an improvement |
Did all of these benchmarks run in release mode? I still would love to see how it performs on Linux and Windows, but overall this is definitely something I'd consider merging 👍 |
It should be possible since this tool work in Linux and Windows as well, but unfortunately I don't have the environment.
Yeah, I just ran in with |
I wanted to set up some reproducible benchmarks that measure relative changes in performance between
Great, I assume |
I compiled a single standalone file, so using |
I've made some benchmarks using As you can see, I'll work on generating cases with different combinations of whitespace. |
Thank you for the benchmarks @jshier, my initial guess was exactly that avoid materialization of string for compare would be a perf win, but I guess the benchmarks show that it is not true. Have you tested cases with white space in the back, I think this would be a win for |
Back half is the same as just having no whitespace at all: both implementations early out, so there's a fixed difference between them, just like the early part of these graphs where the setup cost of |
Thanks for the detailed charts @jshier! What tools did you use to generate them and to collect benchmark data? I was going to use https://github.com/google/swift-benchmark for CI benchmarking, but I see you probably use something fancier? |
That is unexpected, because for a string with no whitespaces in front I would expect Here are the generators: // 50%
let inputGenerator: (Int) -> String = { size in
return String(repeating: "a", count: size / 2) + String(repeating: " ", count: size / 2)
}
// 25% 50%(non-whitespaces) 25%
let inputGenerator: (Int) -> String = { size in
return String(repeating: " ", count: size / 4) + String(repeating: "a", count: size / 2) + String(repeating: " ", count: size / 4)
} |
@MaxDesiatov That would be https://github.com/apple/swift-collections-benchmark, it does produce the results in a json format as well, so it probably can be used on a CI setting too :) |
Yes, I used @LucianoPAlmeida you're right, the backside benchmarks are more interesting as you show. In that case |
Here's my bencher if you want to play with it. |
Thank you @jshier for the benchmarks, really good overview the last one! |
LGTM, thank you both for the detailed investigation! |
This patch is really very simple one, but I believe can be both a simplification on implementation and a perf improvement in the sense that as it is if we take for example as reference the implementation of
trimmingCharacters
we note that it will do a lot extra work as allocating buffers or a whole new string maybe, when this only need a simple check that can exit early and also avoid those extra allocations.I'm not sure if is significant, because I cannot tell if this is used in hot paths within the project and also I didn't measure, so perf improvement is just a guess that makes sense to me, and since it was a simple change I thought in open the PR. So let me know if this makes sense :)